Allow supplemental API data to be passed to ColdfrontFetchProcessor#282
Allow supplemental API data to be passed to ColdfrontFetchProcessor#282QuanMPhm wants to merge 1 commit into
Conversation
| invoice.PI_FIELD: pi_name, | ||
| invoice.INSTITUTION_ID_FIELD: "N/A", | ||
| invoice.CLUSTER_NAME_FIELD: cluster_name, | ||
| invoice.IS_COURSE_FIELD: False, # (TODO) Quan Assuming supplemental data does not contain course info? |
There was a problem hiding this comment.
There was a problem hiding this comment.
What do you mean by "Assuming supplemental data does not contain course info?"
There was a problem hiding this comment.
@joachimweyl The supplemental data's purpose is to provide data that the Coldfront API would normally have (an allocation's name, PI, whether it belongs to a course). The current supplemental data file does not currently have a column to indicate if a project belongs in a course. I wanted to ask if we want to assume projects listed in this file can be assumed to never be in courses.
@joachimweyl Adding the extra column to indicate course-membership is simple.
There was a problem hiding this comment.
Please check the supplemental data file against the original; it looks like it is out of date.
I don't know that we can never assume they are courses, but so far, none of them have been. I would say it is worth adding the column just in case.
| ) | ||
| in_time_range_mask = supplemental_df.apply( | ||
| lambda row: _is_in_time_range( | ||
| row[SUPPLEMENTAL_START_DATE], row[SUPPLEMENTAL_END_DATE] |
There was a problem hiding this comment.
the supplemental data as of now[1] has no start and end date, so this would just error out.
What is the behaviour supposed to be when no start or end date is found? Erroring out? logging? or assume the data is applicable (cc: @joachimweyl)
[1] https://github.com/CCI-MOC/invoicing-private-data/blob/main/project_api_data.csv
There was a problem hiding this comment.
Log, and we should add dates. We will need to work with RH to obtain dates. The template is supposed to be gathering this but all of these projects are pre the template
|
@QuanMPhm quick question. Why not reuse the same logic as the ColdFront API data loading via JSON (it could be extended YAML too)? |
@knikolla There's nothing strictly preventing us from doing that. It would require converting the current supplemental data file from CSV to YAML. Is that fine to do? |
@QuanMPhm I'd rather do a one time conversion of the CSV to YAML than maintain two separate data formats. I would suggest trying to mimic the ColdFront API output as closely as possible within the JSON. |
knikolla
left a comment
There was a problem hiding this comment.
Two comments, otherwise looks pretty good :)
|
|
||
| if invoice_settings.keycloak_client_id: | ||
| logger.info("Loading Coldfront data from remote server") | ||
| api_data += self._fetch_coldfront_allocation_api() |
There was a problem hiding this comment.
This now introduces the possibility of an Allocated Project ID being matched to several instances in the ColdFront data. Can you please write a test that documents the behavior in such a scenario?
There was a problem hiding this comment.
From 2:30 discussion today, I'll keep the behavior as error out and write the test for it
| # pydantic_settings parses complex types as JSON-encoded strings: https://pydantic.dev/docs/validation/latest/concepts/pydantic_settings/#parsing-environment-variable-values | ||
| env["COLDFRONT_API_FILEPATHS"] = json.dumps( | ||
| ( | ||
| str(test_files["test_coldfront_api_data.json"]), |
There was a problem hiding this comment.
In the file parsing code, you actually only call yaml.safe_load so you don't actually support JSON files anymore. Please reintroduce logic to check the file extension and call the appropriate loading mechanism for JSON or YAML.
There was a problem hiding this comment.
When I saw the e2e test pass, I realized yaml.safe_load() can load JSON files as well. Do you still want separate code paths for JSON and YAML loading?
json_data = '{"name": "Alice", "age": 30, "city": "New York"}'
parsed_data = yaml.safe_load(json_data)
# parsed_data is {'name': 'Alice', 'age': 30, 'city': 'New York'}The logic for what API data files are fetched has changed slightly. If local API data files are provided through `COLDFRONT_API_FILEPATHS`, the processor loads all of them. Seperately, if `keycloak_client_id` is set, also fetches from remote Coldfront server Added unit test and updated e2e test
Closes #239. Related to https://github.com/CCI-MOC/invoicing-private-data/pull/68. This allows passing data for billable projects not currently in Coldfront Namely bare metal projects.
This will finally allow billing of Bare Metal projects